10 resultados para logic tree, logicFS, Monte Carlo logic regression, genetic programming for association study, random forest, GENICA

em Helda - Digital Repository of University of Helsinki


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Genetics, the science of heredity and variation in living organisms, has a central role in medicine, in breeding crops and livestock, and in studying fundamental topics of biological sciences such as evolution and cell functioning. Currently the field of genetics is under a rapid development because of the recent advances in technologies by which molecular data can be obtained from living organisms. In order that most information from such data can be extracted, the analyses need to be carried out using statistical models that are tailored to take account of the particular genetic processes. In this thesis we formulate and analyze Bayesian models for genetic marker data of contemporary individuals. The major focus is on the modeling of the unobserved recent ancestry of the sampled individuals (say, for tens of generations or so), which is carried out by using explicit probabilistic reconstructions of the pedigree structures accompanied by the gene flows at the marker loci. For such a recent history, the recombination process is the major genetic force that shapes the genomes of the individuals, and it is included in the model by assuming that the recombination fractions between the adjacent markers are known. The posterior distribution of the unobserved history of the individuals is studied conditionally on the observed marker data by using a Markov chain Monte Carlo algorithm (MCMC). The example analyses consider estimation of the population structure, relatedness structure (both at the level of whole genomes as well as at each marker separately), and haplotype configurations. For situations where the pedigree structure is partially known, an algorithm to create an initial state for the MCMC algorithm is given. Furthermore, the thesis includes an extension of the model for the recent genetic history to situations where also a quantitative phenotype has been measured from the contemporary individuals. In that case the goal is to identify positions on the genome that affect the observed phenotypic values. This task is carried out within the Bayesian framework, where the number and the relative effects of the quantitative trait loci are treated as random variables whose posterior distribution is studied conditionally on the observed genetic and phenotypic data. In addition, the thesis contains an extension of a widely-used haplotyping method, the PHASE algorithm, to settings where genetic material from several individuals has been pooled together, and the allele frequencies of each pool are determined in a single genotyping.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A better understanding of the limiting step in a first order phase transition, the nucleation process, is of major importance to a variety of scientific fields ranging from atmospheric sciences to nanotechnology and even to cosmology. This is due to the fact that in most phase transitions the new phase is separated from the mother phase by a free energy barrier. This barrier is crossed in a process called nucleation. Nowadays it is considered that a significant fraction of all atmospheric particles is produced by vapor-to liquid nucleation. In atmospheric sciences, as well as in other scientific fields, the theoretical treatment of nucleation is mostly based on a theory known as the Classical Nucleation Theory. However, the Classical Nucleation Theory is known to have only a limited success in predicting the rate at which vapor-to-liquid nucleation takes place at given conditions. This thesis studies the unary homogeneous vapor-to-liquid nucleation from a statistical mechanics viewpoint. We apply Monte Carlo simulations of molecular clusters to calculate the free energy barrier separating the vapor and liquid phases and compare our results against the laboratory measurements and Classical Nucleation Theory predictions. According to our results, the work of adding a monomer to a cluster in equilibrium vapour is accurately described by the liquid drop model applied by the Classical Nucleation Theory, once the clusters are larger than some threshold size. The threshold cluster sizes contain only a few or some tens of molecules depending on the interaction potential and temperature. However, the error made in modeling the smallest of clusters as liquid drops results in an erroneous absolute value for the cluster work of formation throughout the size range, as predicted by the McGraw-Laaksonen scaling law. By calculating correction factors to Classical Nucleation Theory predictions for the nucleation barriers of argon and water, we show that the corrected predictions produce nucleation rates that are in good comparison with experiments. For the smallest clusters, the deviation between the simulation results and the liquid drop values are accurately modelled by the low order virial coefficients at modest temperatures and vapour densities, or in other words, in the validity range of the non-interacting cluster theory by Frenkel, Band and Bilj. Our results do not indicate a need for a size dependent replacement free energy correction. The results also indicate that Classical Nucleation Theory predicts the size of the critical cluster correctly. We also presents a new method for the calculation of the equilibrium vapour density, surface tension size dependence and planar surface tension directly from cluster simulations. We also show how the size dependence of the cluster surface tension in equimolar surface is a function of virial coefficients, a result confirmed by our cluster simulations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A better understanding of the limiting step in a first order phase transition, the nucleation process, is of major importance to a variety of scientific fields ranging from atmospheric sciences to nanotechnology and even to cosmology. This is due to the fact that in most phase transitions the new phase is separated from the mother phase by a free energy barrier. This barrier is crossed in a process called nucleation. Nowadays it is considered that a significant fraction of all atmospheric particles is produced by vapor-to liquid nucleation. In atmospheric sciences, as well as in other scientific fields, the theoretical treatment of nucleation is mostly based on a theory known as the Classical Nucleation Theory. However, the Classical Nucleation Theory is known to have only a limited success in predicting the rate at which vapor-to-liquid nucleation takes place at given conditions. This thesis studies the unary homogeneous vapor-to-liquid nucleation from a statistical mechanics viewpoint. We apply Monte Carlo simulations of molecular clusters to calculate the free energy barrier separating the vapor and liquid phases and compare our results against the laboratory measurements and Classical Nucleation Theory predictions. According to our results, the work of adding a monomer to a cluster in equilibrium vapour is accurately described by the liquid drop model applied by the Classical Nucleation Theory, once the clusters are larger than some threshold size. The threshold cluster sizes contain only a few or some tens of molecules depending on the interaction potential and temperature. However, the error made in modeling the smallest of clusters as liquid drops results in an erroneous absolute value for the cluster work of formation throughout the size range, as predicted by the McGraw-Laaksonen scaling law. By calculating correction factors to Classical Nucleation Theory predictions for the nucleation barriers of argon and water, we show that the corrected predictions produce nucleation rates that are in good comparison with experiments. For the smallest clusters, the deviation between the simulation results and the liquid drop values are accurately modelled by the low order virial coefficients at modest temperatures and vapour densities, or in other words, in the validity range of the non-interacting cluster theory by Frenkel, Band and Bilj. Our results do not indicate a need for a size dependent replacement free energy correction. The results also indicate that Classical Nucleation Theory predicts the size of the critical cluster correctly. We also presents a new method for the calculation of the equilibrium vapour density, surface tension size dependence and planar surface tension directly from cluster simulations. We also show how the size dependence of the cluster surface tension in equimolar surface is a function of virial coefficients, a result confirmed by our cluster simulations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Bertrand Russell (1872 1970) introduced the English-speaking philosophical world to modern, mathematical logic and foundational study of mathematics. The present study concerns the conception of logic that underlies his early logicist philosophy of mathematics, formulated in The Principles of Mathematics (1903). In 1967, Jean van Heijenoort published a paper, Logic as Language and Logic as Calculus, in which he argued that the early development of modern logic (roughly the period 1879 1930) can be understood, when considered in the light of a distinction between two essentially different perspectives on logic. According to the view of logic as language, logic constitutes the general framework for all rational discourse, or meaningful use of language, whereas the conception of logic as calculus regards logic more as a symbolism which is subject to reinterpretation. The calculus-view paves the way for systematic metatheory, where logic itself becomes a subject of mathematical study (model-theory). Several scholars have interpreted Russell s views on logic with the help of the interpretative tool introduced by van Heijenoort,. They have commonly argued that Russell s is a clear-cut case of the view of logic as language. In the present study a detailed reconstruction of the view and its implications is provided, and it is argued that the interpretation is seriously misleading as to what he really thought about logic. I argue that Russell s conception is best understood by setting it in its proper philosophical context. This is constituted by Immanuel Kant s theory of mathematics. Kant had argued that purely conceptual thought basically, the logical forms recognised in Aristotelian logic cannot capture the content of mathematical judgments and reasonings. Mathematical cognition is not grounded in logic but in space and time as the pure forms of intuition. As against this view, Russell argued that once logic is developed into a proper tool which can be applied to mathematical theories, Kant s views turn out to be completely wrong. In the present work the view is defended that Russell s logicist philosophy of mathematics, or the view that mathematics is really only logic, is based on what I term the Bolzanian account of logic . According to this conception, (i) the distinction between form and content is not explanatory in logic; (ii) the propositions of logic have genuine content; (iii) this content is conferred upon them by special entities, logical constants . The Bolzanian account, it is argued, is both historically important and throws genuine light on Russell s conception of logic.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This study addresses three important issues in tree bucking optimization in the context of cut-to-length harvesting. (1) Would the fit between the log demand and log output distributions be better if the price and/or demand matrices controlling the bucking decisions on modern cut-to-length harvesters were adjusted to the unique conditions of each individual stand? (2) In what ways can we generate stand and product specific price and demand matrices? (3) What alternatives do we have to measure the fit between the log demand and log output distributions, and what would be an ideal goodness-of-fit measure? Three iterative search systems were developed for seeking stand-specific price and demand matrix sets: (1) A fuzzy logic control system for calibrating the price matrix of one log product for one stand at a time (the stand-level one-product approach); (2) a genetic algorithm system for adjusting the price matrices of one log product in parallel for several stands (the forest-level one-product approach); and (3) a genetic algorithm system for dividing the overall demand matrix of each of the several log products into stand-specific sub-demands simultaneously for several stands and products (the forest-level multi-product approach). The stem material used for testing the performance of the stand-specific price and demand matrices against that of the reference matrices was comprised of 9 155 Norway spruce (Picea abies (L.) Karst.) sawlog stems gathered by harvesters from 15 mature spruce-dominated stands in southern Finland. The reference price and demand matrices were either direct copies or slightly modified versions of those used by two Finnish sawmilling companies. Two types of stand-specific bucking matrices were compiled for each log product. One was from the harvester-collected stem profiles and the other was from the pre-harvest inventory data. Four goodness-of-fit measures were analyzed for their appropriateness in determining the similarity between the log demand and log output distributions: (1) the apportionment degree (index), (2) the chi-square statistic, (3) Laspeyres quantity index, and (4) the price-weighted apportionment degree. The study confirmed that any improvement in the fit between the log demand and log output distributions can only be realized at the expense of log volumes produced. Stand-level pre-control of price matrices was found to be advantageous, provided the control is done with perfect stem data. Forest-level pre-control of price matrices resulted in no improvement in the cumulative apportionment degree. Cutting stands under the control of stand-specific demand matrices yielded a better total fit between the demand and output matrices at the forest level than was obtained by cutting each stand with non-stand-specific reference matrices. The theoretical and experimental analyses suggest that none of the three alternative goodness-of-fit measures clearly outperforms the traditional apportionment degree measure. Keywords: harvesting, tree bucking optimization, simulation, fuzzy control, genetic algorithms, goodness-of-fit

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This study examines the properties of Generalised Regression (GREG) estimators for domain class frequencies and proportions. The family of GREG estimators forms the class of design-based model-assisted estimators. All GREG estimators utilise auxiliary information via modelling. The classic GREG estimator with a linear fixed effects assisting model (GREG-lin) is one example. But when estimating class frequencies, the study variable is binary or polytomous. Therefore logistic-type assisting models (e.g. logistic or probit model) should be preferred over the linear one. However, other GREG estimators than GREG-lin are rarely used, and knowledge about their properties is limited. This study examines the properties of L-GREG estimators, which are GREG estimators with fixed-effects logistic-type models. Three research questions are addressed. First, I study whether and when L-GREG estimators are more accurate than GREG-lin. Theoretical results and Monte Carlo experiments which cover both equal and unequal probability sampling designs and a wide variety of model formulations show that in standard situations, the difference between L-GREG and GREG-lin is small. But in the case of a strong assisting model, two interesting situations arise: if the domain sample size is reasonably large, L-GREG is more accurate than GREG-lin, and if the domain sample size is very small, estimation of assisting model parameters may be inaccurate, resulting in bias for L-GREG. Second, I study variance estimation for the L-GREG estimators. The standard variance estimator (S) for all GREG estimators resembles the Sen-Yates-Grundy variance estimator, but it is a double sum of prediction errors, not of the observed values of the study variable. Monte Carlo experiments show that S underestimates the variance of L-GREG especially if the domain sample size is minor, or if the assisting model is strong. Third, since the standard variance estimator S often fails for the L-GREG estimators, I propose a new augmented variance estimator (A). The difference between S and the new estimator A is that the latter takes into account the difference between the sample fit model and the census fit model. In Monte Carlo experiments, the new estimator A outperformed the standard estimator S in terms of bias, root mean square error and coverage rate. Thus the new estimator provides a good alternative to the standard estimator.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Population dynamics are generally viewed as the result of intrinsic (purely density dependent) and extrinsic (environmental) processes. Both components, and potential interactions between those two, have to be modelled in order to understand and predict dynamics of natural populations; a topic that is of great importance in population management and conservation. This thesis focuses on modelling environmental effects in population dynamics and how effects of potentially relevant environmental variables can be statistically identified and quantified from time series data. Chapter I presents some useful models of multiplicative environmental effects for unstructured density dependent populations. The presented models can be written as standard multiple regression models that are easy to fit to data. Chapters II IV constitute empirical studies that statistically model environmental effects on population dynamics of several migratory bird species with different life history characteristics and migration strategies. In Chapter II, spruce cone crops are found to have a strong positive effect on the population growth of the great spotted woodpecker (Dendrocopos major), while cone crops of pine another important food resource for the species do not effectively explain population growth. The study compares rate- and ratio-dependent effects of cone availability, using state-space models that distinguish between process and observation error in the time series data. Chapter III shows how drought, in combination with settling behaviour during migration, produces asymmetric spatially synchronous patterns of population dynamics in North American ducks (genus Anas). Chapter IV investigates the dynamics of a Finnish population of skylark (Alauda arvensis), and point out effects of rainfall and habitat quality on population growth. Because the skylark time series and some of the environmental variables included show strong positive autocorrelation, the statistical significances are calculated using a Monte Carlo method, where random autocorrelated time series are generated. Chapter V is a simulation-based study, showing that ignoring observation error in analyses of population time series data can bias the estimated effects and measures of uncertainty, if the environmental variables are autocorrelated. It is concluded that the use of state-space models is an effective way to reach more accurate results. In summary, there are several biological assumptions and methodological issues that can affect the inferential outcome when estimating environmental effects from time series data, and that therefore need special attention. The functional form of the environmental effects and potential interactions between environment and population density are important to deal with. Other issues that should be considered are assumptions about density dependent regulation, modelling potential observation error, and when needed, accounting for spatial and/or temporal autocorrelation.